Coding systems

ABSTRACT

In an implementation, a supplemental sequence parameter set (“SPS”) structure is provided that has its own network abstraction layer (“NAL”) unit type and allows transmission of layer-dependent parameters for non-base layers in an SVC environment. The supplemental SPS structure also may be used for view information in an MVC environment. In a general aspect, a structure is provided that includes (1) information (1410) from an SPS NAL unit, the information describing a parameter for use in decoding a first-layer encoding of a sequence of images, and (2) information (1420) from a supplemental SPS NAL unit having a different structure than the SPS NAL unit, and the information from the supplemental SPS NAL unit describing a parameter for use in decoding a second-layer encoding of the sequence of images. Associated methods and apparatuses are provided on the encoder and decoder sides, as well as for the signal.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of co-pending U.S. application Ser. No. 12/450,868 filed Mar. 5, 2010, herein incorporated by reference. This application claims the benefit, under 35 U.S.C. §365 of International Application PCT/US2008/004530 filed Apr. 7, 2008, which was published in accordance with PCT Article 21(2) on Oct. 30, 2008 in English and which claims the benefit of U.S. provisional patent application No. 60/923,993 filed Apr. 18, 2007 and U.S. patent application Ser. No. 11/824,006 filed Jun. 28, 2007.

TECHNICAL FIELD

At least one implementation relates to encoding and decoding video data in a scalable manner.

BACKGROUND

Coding video data according to several layers can be useful when terminals for which data are intended have different capacities and therefore do not decode a full data stream but only part of a full data stream. When the video data are coded according to several layers in a scalable manner, the receiving terminal can extract from the received bit-stream a portion of the data according to the terminal's profile. A full data stream may also transmit overhead information for each supported layer, to facilitate decoding of each of the layers at a terminal.

SUMMARY

According to a general aspect, information is accessed from a sequence parameter set (“SPS”) network abstraction layer (“NAL”) unit. The information describes a parameter for use in decoding a first-layer encoding of a sequence of images. Information is also accessed from a supplemental SPS NAL unit having a different structure than the SPS NAL unit. The information from the supplemental SPS NAL unit describes a parameter for use in decoding a second-layer encoding of the sequence of images. A decoding of the sequence of images is generated based on the first-layer encoding, the second-layer encoding, the accessed information from the SPS NAL unit, and the accessed information from the supplemental SPS NAL unit.

According to another general aspect, a syntax structure is used that provides for decoding a sequence of images in multiple layers. The syntax structure includes syntax for an SPS NAL unit including information describing a parameter for use in decoding a first-layer encoding of a sequence of images. The syntax structure also includes syntax for a supplemental SPS NAL unit having a different structure than the SPS NAL unit. The supplemental SPS NAL unit includes information describing a parameter for use in decoding a second-layer encoding of the sequence of images. A decoding of the sequence of images may be generated based on the first-layer encoding, the second-layer encoding, the information from the SPS NAL unit, and the information from the supplemental SPS NAL unit.

According to another general aspect, a signal is formatted to include information from an SPS NAL unit. The information describes a parameter for use in decoding a first-layer encoding of a sequence of images. The signal is further formatted to include information from a supplemental SPS NAL unit having a different structure than the SPS NAL unit. The information from the supplemental SPS NAL unit describes a parameter for use in decoding a second-layer encoding of the sequence of images.

According to another general aspect, an SPS NAL unit is generated that includes information describing a parameter for use in decoding a first-layer encoding of a sequence of images. A supplemental SPS NAL unit is generated that has a different structure than the SPS NAL unit. The supplemental SPS NAL unit includes information that describes a parameter for use in decoding a second-layer encoding of the sequence of images. A set of data is provided that includes the first-layer encoding of the sequence of images, the second-layer encoding of the sequence of images, the SPS NAL unit, and the supplemental SPS NAL unit.

According to another general aspect, a syntax structure is used that provides for encoding a sequence of images in multiple layers. The syntax structure includes syntax for an SPS NAL unit. The SPS NAL unit includes information that describes a parameter for use in decoding a first-layer encoding of a sequence of images. The syntax structure includes syntax for a supplemental SPS NAL unit. The supplemental SPS NAL unit has a different structure than the SPS NAL unit. The supplemental SPS NAL unit includes information that describes a parameter for use in decoding a second-layer encoding of the sequence of images. A set of data may be provided that includes the first-layer encoding of the sequence of images, the second-layer encoding of the sequence of images, the SPS NAL unit, and the supplemental SPS NAL unit.

According to another general aspect, first layer-dependent information is accessed in a first normative parameter set. The accessed first layer-dependent information is for use in decoding a first-layer encoding of a sequence of images. Second layer-dependent information is accessed in a second normative parameter set. The second normative parameter set has a different structure than the first normative parameter set. The accessed second layer-dependent information is for use in decoding a second-layer encoding of the sequence of images. The sequence of images is decoded based on one or more of the accessed first layer-dependent information or the accessed second layer-dependent information.

According to another general aspect, a first normative parameter set is generated that includes first layer-dependent information. The first layer-dependent information is for use in decoding a first-layer encoding of a sequence of images. A second normative parameter set is generated having a different structure than the first normative parameter set. The second normative parameter set includes second layer-dependent information for use in decoding a second-layer encoding of the sequence of images. A set of data is provided that includes the first normative parameter set and the second normative parameter set.

The details of one or more implementations are set forth in the accompanying drawings and the description below. Even if described in one particular manner, it should be clear that implementations may be configured or embodied in various manners. For example, an implementation may be performed as a method, or embodied as an apparatus, such as, for example, an apparatus configured to perform a set of operations or an apparatus storing instructions for performing a set of operations, or embodied in a signal. Other aspects and features will become apparent from the following detailed description considered in conjunction with the accompanying drawings and the claims.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram for an implementation of an encoder.

FIG. 1a is a block diagram for another implementation of an encoder.

FIG. 2 is a block diagram for an implementation of a decoder.

FIG. 2a is a block diagram for another implementation of a decoder.

FIG. 3 is a structure of an implementation of a Single-Layer Sequence Parameter Set (“SPS”) Network Abstraction Layer (“NAL”) unit.

FIG. 4 is a block view of an example of portions of a data stream illustrating use of an SPS NAL unit.

FIG. 5 is a structure of an implementation of a supplemental SPS (“SUP SPS”) NAL unit.

FIG. 6 is an implementation of an organizational hierarchy among an SPS unit and multiple SUP SPS units.

FIG. 7 is a structure of another implementation of a SUP SPS NAL unit.

FIG. 8 is a functional view of an implementation of a scalable video coder that generates SUP SPS units.

FIG. 9 is a hierarchical view of an implementation of the generation of a data stream that contains SUP SPS units.

FIG. 10 is a block view of an example of a data stream generated by the implementation of FIG. 9.

FIG. 11 is a block diagram of an implementation of an encoder.

FIG. 12 is a block diagram of another implementation of an encoder.

FIG. 13 is a flow chart of an implementation of an encoding process used by the encoders of FIG. 11 or 12.

FIG. 14 is a block view of an example of a data stream generated by the process of FIG. 13.

FIG. 15 is a block diagram of an implementation of a decoder.

FIG. 16 is a block diagram of another implementation of a decoder.

FIG. 17 is a flow chart of an implementation of a decoding process used by the decoders of FIG. 15 or 16.

DETAILED DESCRIPTION

Several video coding standards exist today that can code video data according to different layers and/or profiles. Among them, one can cite H.264/MPEG-4 AVC (the “AVC standard”), also referenced as the International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) Moving Picture Experts Group-4 (MPEG-4) Part 10 Advanced Video Coding (AVC) standard/International Telecommunication Union, Telecommunication Sector (ITU-T) H.264 recommendation. Additionally, extensions to the AVC standard exist. A first such extension is a scalable video coding (“SVC”) extension (Annex G) referred to as H.264/MPEG-4 AVC, scalable video coding extension (the “SVC extension”). A second such extension is a multi-view video coding (“MVC”) extension (Annex H) referred to as H.264/MPEG-4 AVC, MVC extension (the “MVC extension”).

At least one implementation described in this disclosure may be used with the AVC standard as well as the SVC and MVC extensions. The implementation provides a supplemental (“SUP”) sequence parameter set (“SPS”) network abstraction layer (“NAL”) unit having a different NAL unit type than SPS NAL units. An SPS unit typically includes, but need not include, information for at least a single layer. Further, the SUP SPS NAL unit includes layer-dependent information for at least one additional layer. Thus, by accessing SPS and SUP SPS units, a decoder has available certain (and typically all) layer-dependent information needed to decode a bit stream.

Using this implementation in an AVC system, the SUP SPS NAL units need not be transmitted, and a single-layer SPS NAL unit (as described below) may be transmitted. Using this implementation in an SVC (or MVC) system, the SUP SPS NAL unit(s) may be transmitted for the desired additional layers (or views), in addition to an SPS NAL unit. Using this implementation in a system including both AVC-compatible decoders and SVC-compatible (or MVC-compatible) decoders, the AVC-compatible decoders may ignore the SUP SPS NAL units by detecting the NAL unit type. In each case, efficiency and compatibility are achieved.
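The type-based filtering described above can be illustrated with a minimal Python sketch. It assumes the AVC NAL unit header layout, in which the low five bits of the first header byte carry nal_unit_type and an SPS has type 7. The value chosen for SUP_SPS_NAL_TYPE and the function names are hypothetical placeholders for illustration only; this passage does not specify them.

```python
SPS_NAL_TYPE = 7        # nal_unit_type of an SPS NAL unit in the AVC standard
SUP_SPS_NAL_TYPE = 24   # ASSUMPTION: placeholder value, not taken from the text above


def nal_unit_type(nal_unit: bytes) -> int:
    """Return the 5-bit nal_unit_type from the first NAL header byte."""
    # Header byte layout: forbidden_zero_bit(1) | nal_ref_idc(2) | nal_unit_type(5)
    return nal_unit[0] & 0x1F


def filter_for_decoder(nal_units, understood_types):
    """Keep only the NAL units whose type the decoder understands."""
    return [u for u in nal_units if nal_unit_type(u) in understood_types]


# Two toy NAL units (header byte plus one dummy payload byte).
sps_unit = bytes([0x67, 0x00])      # 0x67 & 0x1F == 7
sup_sps_unit = bytes([0x78, 0x00])  # 0x78 & 0x1F == 24
stream = [sps_unit, sup_sps_unit]

# A single-layer decoder skips the SUP SPS; a scalable decoder keeps both.
avc_view = filter_for_decoder(stream, {SPS_NAL_TYPE})
svc_view = filter_for_decoder(stream, {SPS_NAL_TYPE, SUP_SPS_NAL_TYPE})
```

In this sketch the single-layer decoder never parses the body of the supplemental unit, which is the compatibility property the paragraph above relies on.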

The above implementation also provides benefits for systems (standards or otherwise) that impose a requirement that certain layers share header information, such as, for example, an SPS or particular information typically carried in an SPS. For example, if a base layer and its composite temporal layers need to share an SPS, then the layer-dependent information cannot be transmitted with the shared SPS. However, the SUP SPS provides a mechanism for transmitting the layer-dependent information.

The SUP SPS of various implementations also provides an efficiency advantage in that the SUP SPS need not include, and therefore repeat, all of the parameters in the SPS. The SUP SPS will typically be focused on the layer-dependent parameters. However, various implementations include a SUP SPS structure that includes non-layer-dependent parameters, or even repeats all of an SPS structure.

Various implementations relate to the SVC extension. The SVC extension proposes the transmission of video data according to several spatial levels, temporal levels, and quality levels. For one spatial level, one can code according to several temporal levels, and for each temporal level according to several quality levels. Therefore, when m spatial levels, n temporal levels, and O quality levels are defined, the video data can be coded according to m*n*O different combinations. These combinations are referred to as layers, or as interoperability points (“IOPs”). According to the decoder (also referred to as the receiver or the client) capabilities, different layers may be transmitted up to a certain layer corresponding to the maximum of the client capabilities.
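As a small illustration of the m*n*O count, the following Python sketch enumerates every (spatial, temporal, quality) combination and then selects the layers that fall within a client's maximum levels. The function names and the rule of capping each level independently are assumptions made for illustration, not a selection rule stated in this disclosure.

```python
from itertools import product


def enumerate_layers(m: int, n: int, o: int):
    """List every (spatial, temporal, quality) combination, m*n*o in total."""
    return list(product(range(m), range(n), range(o)))


def layers_up_to(layers, max_d: int, max_t: int, max_q: int):
    """ASSUMPTION: a client accepts a layer if each level is within its maximum."""
    return [(d, t, q) for (d, t, q) in layers
            if d <= max_d and t <= max_t and q <= max_q]


# Example: 2 spatial levels, 3 temporal levels, 2 quality levels.
all_layers = enumerate_layers(2, 3, 2)

# A client limited to spatial level 0, temporal level 1, quality level 1.
client_layers = layers_up_to(all_layers, max_d=0, max_t=1, max_q=1)
```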

As used herein, “layer-dependent” information refers to information that relates specifically to a single layer. That is, as the name suggests, the information is dependent upon the specific layer. Such information need not necessarily vary from layer to layer, but would typically be provided separately for each layer.

As used herein, “high level syntax” refers to syntax present in the bitstream that resides hierarchically above the macroblock layer. For example, high level syntax, as used herein, may refer to, but is not limited to, syntax at the slice header level, Supplemental Enhancement Information (SEI) level, Picture Parameter Set (PPS) level, Sequence Parameter Set (SPS) level, and Network Abstraction Layer (NAL) unit header level.

Referring to FIG. 1, an exemplary SVC encoder is indicated generally by the reference numeral 100. The SVC encoder 100 may also be used for AVC encoding, that is, for a single layer (for example, base layer). Further, the SVC encoder 100 may be used for MVC encoding as one of ordinary skill in the art will appreciate. For example, various components of the SVC encoder 100, or variations of these components, may be used in encoding multiple views.

A first output of a temporal decomposition module 142 is connected in signal communication with a first input of an intra prediction for intra block module 146. A second output of the temporal decomposition module 142 is connected in signal communication with a first input of a motion coding module 144. An output of the intra prediction for intra block module 146 is connected in signal communication with an input of a transform/entropy coder (signal to noise ratio (SNR) scalable) 149. A first output of the transform/entropy coder 149 is connected in signal communication with a first input of a multiplexer 170.

A first output of a temporal decomposition module 132 is connected in signal communication with a first input of an intra prediction for intra block module 136. A second output of the temporal decomposition module 132 is connected in signal communication with a first input of a motion coding module 134. An output of the intra prediction for intra block module 136 is connected in signal communication with an input of a transform/entropy coder (signal to noise ratio (SNR) scalable) 139. A first output of the transform/entropy coder 139 is connected in signal communication with a first input of a multiplexer 170.

A second output of the transform/entropy coder 149 is connected in signal communication with an input of a 2D spatial interpolation module 138. An output of 2D spatial interpolation module 138 is connected in signal communication with a second input of the intra prediction for intra block module 136. A second output of the motion coding module 144 is connected in signal communication with an input of the motion coding module 134.

A first output of a temporal decomposition module 122 is connected in signal communication with a first input of an intra predictor 126. A second output of the temporal decomposition module 122 is connected in signal communication with a first input of a motion coding module 124. An output of the intra predictor 126 is connected in signal communication with an input of a transform/entropy coder (signal to noise ratio (SNR) scalable) 129. An output of the transform/entropy coder 129 is connected in signal communication with a first input of a multiplexer 170.

A second output of the transform/entropy coder 139 is connected in signal communication with an input of a 2D spatial interpolation module 128. An output of 2D spatial interpolation module 128 is connected in signal communication with a second input of the intra predictor module 126. A second output of the motion coding module 134 is connected in signal communication with an input of the motion coding module 124.

A first output of the motion coding module 124, a first output of the motion coding module 134, and a first output of the motion coding module 144 are each connected in signal communication with a second input of the multiplexer 170.

A first output of a 2D spatial decimation module 104 is connected in signal communication with an input of the temporal decomposition module 132. A second output of the 2D spatial decimation module 104 is connected in signal communication with an input of the temporal decomposition module 142.

An input of the temporal decomposition module 122 and an input of the 2D spatial decimation module 104 are available as inputs of the encoder 100, for receiving input video 102.

An output of the multiplexer 170 is available as an output of the encoder 100, for providing a bitstream 180.

The temporal decomposition module 122, the temporal decomposition module 132, the temporal decomposition module 142, the motion coding module 124, the motion coding module 134, the motion coding module 144, the intra predictor 126, the intra predictor 136, the intra predictor 146, the transform/entropy coder 129, the transform/entropy coder 139, the transform/entropy coder 149, the 2D spatial interpolation module 128, and the 2D spatial interpolation module 138 are included in a core encoder portion 187 of the encoder 100.

FIG. 1 includes three core encoders 187. In the implementation shown, the bottom-most core encoder 187 may encode a base layer, with the middle and upper core encoders 187 encoding higher layers.

Turning to FIG. 2, an exemplary SVC decoder is indicated generally by the reference numeral 200. The SVC decoder 200 may also be used for AVC decoding, that is, for a single view. Further, the SVC decoder 200 may be used for MVC decoding as one of ordinary skill in the art will appreciate. For example, various components of the SVC decoder 200, or variations of these components, may be used in decoding multiple views.

Note that encoder 100 and decoder 200, as well as other encoders and decoders discussed in this disclosure, can be configured to perform various methods shown throughout this disclosure. In addition to performing encoding operations, the encoders described in this disclosure may perform various decoding operations during a reconstruction process in order to mirror the expected actions of a decoder. For example, an encoder may decode SUP SPS units to decode encoded video data in order to produce a reconstruction of the encoded video data for use in predicting additional video data. Consequently, an encoder may perform substantially all of the operations that are performed by a decoder.

An input of a demultiplexer 202 is available as an input to the scalable video decoder 200, for receiving a scalable bitstream. A first output of the demultiplexer 202 is connected in signal communication with an input of a spatial inverse transform SNR scalable entropy decoder 204. A first output of the spatial inverse transform SNR scalable entropy decoder 204 is connected in signal communication with a first input of a prediction module 206. An output of the prediction module 206 is connected in signal communication with a first input of a combiner 230.

A second output of the spatial inverse transform SNR scalable entropy decoder 204 is connected in signal communication with a first input of a motion vector (MV) decoder 210. An output of the MV decoder 210 is connected in signal communication with an input of a motion compensator 232. An output of the motion compensator 232 is connected in signal communication with a second input of the combiner 230.

A second output of the demultiplexer 202 is connected in signal communication with an input of a spatial inverse transform SNR scalable entropy decoder 212. A first output of the spatial inverse transform SNR scalable entropy decoder 212 is connected in signal communication with a first input of a prediction module 214. A first output of the prediction module 214 is connected in signal communication with an input of an interpolation module 216. An output of the interpolation module 216 is connected in signal communication with a second input of the prediction module 206. A second output of the prediction module 214 is connected in signal communication with a first input of a combiner 240.

A second output of the spatial inverse transform SNR scalable entropy decoder 212 is connected in signal communication with a first input of an MV decoder 220. A first output of the MV decoder 220 is connected in signal communication with a second input of the MV decoder 210. A second output of the MV decoder 220 is connected in signal communication with an input of a motion compensator 242. An output of the motion compensator 242 is connected in signal communication with a second input of the combiner 240.

A third output of the demultiplexer 202 is connected in signal communication with an input of a spatial inverse transform SNR scalable entropy decoder 222. A first output of the spatial inverse transform SNR scalable entropy decoder 222 is connected in signal communication with an input of a prediction module 224. A first output of the prediction module 224 is connected in signal communication with an input of an interpolation module 226. An output of the interpolation module 226 is connected in signal communication with a second input of the prediction module 214.

A second output of the prediction module 224 is connected in signal communication with a first input of a combiner 250. A second output of the spatial inverse transform SNR scalable entropy decoder 222 is connected in signal communication with an input of an MV decoder 230. A first output of the MV decoder 230 is connected in signal communication with a second input of the MV decoder 220. A second output of the MV decoder 230 is connected in signal communication with an input of a motion compensator 252. An output of the motion compensator 252 is connected in signal communication with a second input of the combiner 250.

An output of the combiner 250 is available as an output of the decoder 200, for outputting a layer 0 signal. An output of the combiner 240 is available as an output of the decoder 200, for outputting a layer 1 signal. An output of the combiner 230 is available as an output of the decoder 200, for outputting a layer 2 signal.

Referring to FIG. 1a, an exemplary AVC encoder is indicated generally by the reference numeral 2100. The AVC encoder 2100 may be used, for example, for encoding a single layer (for example, base layer).

The video encoder 2100 includes a frame ordering buffer 2110 having an output in signal communication with a non-inverting input of a combiner 2185. An output of the combiner 2185 is connected in signal communication with a first input of a transformer and quantizer 2125. An output of the transformer and quantizer 2125 is connected in signal communication with a first input of an entropy coder 2145 and a first input of an inverse transformer and inverse quantizer 2150. An output of the entropy coder 2145 is connected in signal communication with a first non-inverting input of a combiner 2190. An output of the combiner 2190 is connected in signal communication with a first input of an output buffer 2135.

A first output of an encoder controller 2105 is connected in signal communication with a second input of the frame ordering buffer 2110, a second input of the inverse transformer and inverse quantizer 2150, an input of a picture-type decision module 2115, an input of a macroblock-type (MB-type) decision module 2120, a second input of an intra prediction module 2160, a second input of a deblocking filter 2165, a first input of a motion compensator 2170, a first input of a motion estimator 2175, and a second input of a reference picture buffer 2180.

A second output of the encoder controller 2105 is connected in signal communication with a first input of a Supplemental Enhancement Information (“SEI”) inserter 2130, a second input of the transformer and quantizer 2125, a second input of the entropy coder 2145, a second input of the output buffer 2135, and an input of the Sequence Parameter Set (SPS) and Picture Parameter Set (PPS) inserter 2140.

A first output of the picture-type decision module 2115 is connected in signal communication with a third input of a frame ordering buffer 2110. A second output of the picture-type decision module 2115 is connected in signal communication with a second input of a macroblock-type decision module 2120.

An output of the Sequence Parameter Set (“SPS”) and Picture Parameter Set (“PPS”) inserter 2140 is connected in signal communication with a third non-inverting input of the combiner 2190. An output of the SEI inserter 2130 is connected in signal communication with a second non-inverting input of the combiner 2190.

An output of the inverse transformer and inverse quantizer 2150 is connected in signal communication with a first non-inverting input of a combiner 2127. An output of the combiner 2127 is connected in signal communication with a first input of the intra prediction module 2160 and a first input of the deblocking filter 2165. An output of the deblocking filter 2165 is connected in signal communication with a first input of a reference picture buffer 2180. An output of the reference picture buffer 2180 is connected in signal communication with a second input of the motion estimator 2175 and with a first input of a motion compensator 2170. A first output of the motion estimator 2175 is connected in signal communication with a second input of the motion compensator 2170. A second output of the motion estimator 2175 is connected in signal communication with a third input of the entropy coder 2145.

An output of the motion compensator 2170 is connected in signal communication with a first input of a switch 2197. An output of the intra prediction module 2160 is connected in signal communication with a second input of the switch 2197. An output of the macroblock-type decision module 2120 is connected in signal communication with a third input of the switch 2197 in order to provide a control input to the switch 2197. An output of the switch 2197 is connected in signal communication with a second non-inverting input of the combiner 2127 and with an inverting input of the combiner 2185.

Inputs of the frame ordering buffer 2110 and the encoder controller 2105 are available as inputs of the encoder 2100, for receiving an input picture 2101. Moreover, an input of the SEI inserter 2130 is available as an input of the encoder 2100, for receiving metadata. An output of the output buffer 2135 is available as an output of the encoder 2100, for outputting a bitstream.

Referring to FIG. 2a, a video decoder capable of performing video decoding in accordance with the MPEG-4 AVC standard is indicated generally by the reference numeral 2200.

The video decoder 2200 includes an input buffer 2210 having an output connected in signal communication with a first input of an entropy decoder 2245. A first output of the entropy decoder 2245 is connected in signal communication with a first input of an inverse transformer and inverse quantizer 2250. An output of the inverse transformer and inverse quantizer 2250 is connected in signal communication with a second non-inverting input of a combiner 2225. An output of the combiner 2225 is connected in signal communication with a second input of a deblocking filter 2265 and a first input of an intra prediction module 2260. A second output of the deblocking filter 2265 is connected in signal communication with a first input of a reference picture buffer 2280. An output of the reference picture buffer 2280 is connected in signal communication with a second input of a motion compensator 2270.

A second output of the entropy decoder 2245 is connected in signal communication with a third input of the motion compensator 2270 and a first input of the deblocking filter 2265. A third output of the entropy decoder 2245 is connected in signal communication with an input of a decoder controller 2205. A first output of the decoder controller 2205 is connected in signal communication with a second input of the entropy decoder 2245. A second output of the decoder controller 2205 is connected in signal communication with a second input of the inverse transformer and inverse quantizer 2250. A third output of the decoder controller 2205 is connected in signal communication with a third input of the deblocking filter 2265. A fourth output of the decoder controller 2205 is connected in signal communication with a second input of the intra prediction module 2260, with a first input of the motion compensator 2270, and with a second input of the reference picture buffer 2280.

An output of the motion compensator 2270 is connected in signal communication with a first input of a switch 2297. An output of the intra prediction module 2260 is connected in signal communication with a second input of the switch 2297. An output of the switch 2297 is connected in signal communication with a first non-inverting input of the combiner 2225.

An input of the input buffer 2210 is available as an input of the decoder 2200, for receiving an input bitstream. A first output of the deblocking filter 2265 is available as an output of the decoder 2200, for outputting an output picture.

Referring to FIG. 3, a structure for a single-layer SPS 300 is shown. An SPS is a syntax structure that generally contains syntax elements that apply to zero or more entire coded video sequences. In the SVC extension, the values of some syntax elements conveyed in the SPS are layer dependent. These layer-dependent syntax elements include, but are not limited to, the timing information, HRD (standing for “Hypothetical Reference Decoder”) parameters, and bitstream restriction information. HRD parameters may include, for example, indicators of buffer size, maximum bit rate, and initial delay. HRD parameters may allow a receiving system, for example, to verify the integrity of a received bit stream and/or to determine if the receiving system (for example, a decoder) can decode the bit stream. Therefore, a system may provide for the transmission of the aforementioned syntax elements for each layer.
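To illustrate how per-layer HRD parameters might let a receiving system decide whether it can decode a layer, consider the following minimal Python sketch. The field names, units, and the comparison rule are all assumptions chosen for illustration; they are not the HRD syntax elements or the conformance procedure of the AVC standard.

```python
from dataclasses import dataclass


@dataclass
class HrdParams:
    # ASSUMPTION: simplified stand-ins for the indicators named in the text.
    buffer_size_bits: int
    max_bit_rate_bps: int
    initial_delay_ms: int


@dataclass
class ReceiverCapabilities:
    # ASSUMPTION: hypothetical description of what a receiver can handle.
    buffer_size_bits: int
    max_bit_rate_bps: int


def can_decode(hrd: HrdParams, caps: ReceiverCapabilities) -> bool:
    """Return True if the layer's requirements fit within the receiver's limits."""
    return (hrd.buffer_size_bits <= caps.buffer_size_bits
            and hrd.max_bit_rate_bps <= caps.max_bit_rate_bps)


# Toy numbers: a base layer and a more demanding enhancement layer.
base_hrd = HrdParams(buffer_size_bits=500_000, max_bit_rate_bps=1_000_000, initial_delay_ms=100)
enh_hrd = HrdParams(buffer_size_bits=4_000_000, max_bit_rate_bps=8_000_000, initial_delay_ms=200)
small_receiver = ReceiverCapabilities(buffer_size_bits=1_000_000, max_bit_rate_bps=2_000_000)
```

Because the values differ per layer in this example, a single set of HRD parameters could not describe both layers, which is why the passage above calls for transmitting them for each layer.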

The single-layer SPS 300 includes an SPS-ID 310 that provides anidentifier for the SPS. The single-layer SPS 300 also includes the VUI(standing for Video Usability Information) parameters 320 for a singlelayer. The VUI parameters include the HRD parameters 330 for a singlelayer, such as, for example, the base layer. The single-layer SPS 300also may include additional parameters 340, although implementationsneed not include any additional parameters 340.

Referring to FIG. 4, a block view of a data stream 400 shows a typicaluse of the single-layer SPS 300. In the AVC standard, for example, atypical data stream may include, among other components, an SPS unit,multiple PPS (picture parameter sequence) units providing parameters fora particular picture, and multiple units for encoded picture data. Sucha general framework is followed in FIG. 4, which includes the SPS 300, aPPS-1 410, one or more units 420 including encoded picture-1 data, aPPS-2 430, and one or more units 440 including encoded picture-2 data.The PPS-1 410 includes parameters for the encoded picture-1 data 420,and the PPS-2 430 includes parameters for the encoded picture-2 data440.

The encoded picture-1 data 420, and the encoded picture-2 data 440, are each associated with a particular SPS (the SPS 300 in the implementation of FIG. 4). This is achieved through the use of pointers, as now explained. The encoded picture-1 data 420 includes a PPS-ID (not shown) identifying the PPS-1 410, as shown by an arrow 450. The PPS-ID may be stored in, for example, a slice header. The encoded picture-2 data 440 includes a PPS-ID (not shown) identifying the PPS-2 430, as shown by an arrow 460. The PPS-1 410 and the PPS-2 430 each include an SPS-ID (not shown) identifying the SPS 300, as shown by arrows 470 and 480 respectively.
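The pointer chain of FIG. 4 may be illustrated with a minimal sketch. The class names, the table layout, and the function resolve_sps are hypothetical and chosen only for illustration; they are not part of the AVC standard or of any described implementation.

```python
from dataclasses import dataclass


@dataclass
class Sps:
    sps_id: int


@dataclass
class Pps:
    pps_id: int
    sps_id: int  # pointer to the associated SPS (arrows 470 and 480)


@dataclass
class EncodedPicture:
    pps_id: int  # pointer carried, for example, in a slice header (arrows 450 and 460)


def resolve_sps(picture, pps_table, sps_table):
    """Follow picture data -> PPS -> SPS using the stored identifiers."""
    pps = pps_table[picture.pps_id]
    return sps_table[pps.sps_id]


# Hypothetical identifiers standing in for SPS 300, PPS-1 410, and PPS-2 430.
sps_table = {0: Sps(sps_id=0)}
pps_table = {1: Pps(pps_id=1, sps_id=0), 2: Pps(pps_id=2, sps_id=0)}
picture_1 = EncodedPicture(pps_id=1)
picture_2 = EncodedPicture(pps_id=2)
```

In this sketch both pictures resolve to the same SPS object even though each uses its own PPS.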

Referring to FIG. 5, a structure for a SUP SPS 500 is shown. SUP SPS 500 includes an SPS ID 510, a VUI 520 that includes HRD parameters 530 for a single additional layer referred to by “(D2, T2, Q2)”, and optional additional parameters 540. “D2, T2, Q2” refers to a second layer having spatial (D) level 2, temporal (T) level 2, and quality (Q) level 2.

Note that various numbering schemes may be used to refer to layers. In one numbering scheme, base layers have a D, T, Q of 0, x, 0, meaning a spatial level of zero, any temporal level, and a quality level of zero. In that numbering scheme, enhancement layers have a D, T, Q in which D or Q is greater than zero.

The use of SUP SPS 500 allows, for example, a system to use an SPS structure that only includes parameters for a single layer, or that does not include any layer-dependent information. Such a system may create a separate SUP SPS for each additional layer beyond the base layer. The additional layers can identify the SPS with which they are associated through the use of the SPS ID 510. Several layers can share a single SPS by using a common SPS ID in their respective SUP SPS units.

Referring to FIG. 6, an organizational hierarchy 600 is shown among an SPS unit 605 and multiple SUP SPS units 610 and 620. The SUP SPS units 610 and 620 are shown as being single-layer SUP SPS units, but other implementations may use one or more multiple-layer SUP SPS units in addition to, or in lieu of, single-layer SUP SPS units. The hierarchy 600 illustrates that, in a typical scenario, multiple SUP SPS units may be associated with a single SPS unit. Implementations may, of course, include multiple SPS units, and each of the SPS units may have associated SUP SPS units.
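One way to picture the hierarchy 600 is as a mapping from each SPS ID to the single-layer SUP SPS units that carry that ID. The following is a minimal sketch; the SupSps type, the build_hierarchy function, and the HRD values are hypothetical and serve only as an illustration.

```python
from typing import Dict, NamedTuple, Tuple

Layer = Tuple[int, int, int]  # (D, T, Q)


class SupSps(NamedTuple):
    sps_id: int   # identifies the associated SPS
    layer: Layer  # the single layer this SUP SPS describes
    hrd: dict     # layer-dependent HRD parameters (illustrative values only)


def build_hierarchy(sup_sps_units) -> Dict[int, Dict[Layer, SupSps]]:
    """Group SUP SPS units under the SPS they point to."""
    hierarchy: Dict[int, Dict[Layer, SupSps]] = {}
    for unit in sup_sps_units:
        hierarchy.setdefault(unit.sps_id, {})[unit.layer] = unit
    return hierarchy


# Two SUP SPS units sharing one SPS, in the manner of units 610 and 620.
units = [
    SupSps(sps_id=0, layer=(1, 0, 0), hrd={"max_bit_rate": 1000}),
    SupSps(sps_id=0, layer=(1, 1, 0), hrd={"max_bit_rate": 2000}),
]
hierarchy = build_hierarchy(units)
```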

Referring to FIG. 7, a structure for another SUP SPS 700 is shown. SUP SPS 700 includes parameters for multiple layers, whereas SUP SPS 500 includes parameters for a single layer. SUP SPS 700 includes an SPS ID 710, a VUI 720, and optional additional parameters 740. The VUI 720 includes HRD parameters 730 for a first additional layer (D2, T2, Q2), and for other additional layers up to layer (Dn, Tn, Qn).

Referring again to FIG. 6, the hierarchy 600 may be modified to use a multiple-layer SUP SPS. For example, the combination of the SUP SPS 610 and 620 may be replaced with the SUP SPS 700 if both the SUP SPS 610 and 620 include the same SPS ID.

Additionally, the SUP SPS 700 may be used, for example, with an SPS that includes parameters for a single layer, or that includes parameters for multiple layers, or that does not include layer-dependent parameters for any layers. The SUP SPS 700 allows a system to provide parameters for multiple layers with little overhead.

Other implementations may be based, for example, on an SPS that includes all the needed parameters for all possible layers. That is, the SPS of such an implementation includes all the corresponding spatial (D_(i)), temporal (T_(i)), and quality (Q_(i)) levels that are available to be transmitted, whether all the layers are transmitted or not. Even with such a system, however, a SUP SPS may be used to provide an ability to change the parameters for one or more layers without transmitting the entire SPS again.
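The update scenario just described may be sketched as follows, assuming that a receiver keeps the per-layer parameters from the SPS in a table and overlays any values later received in a SUP SPS. The function name, the table layout, and the parameter names are hypothetical.

```python
def effective_layer_params(sps_params, sup_sps_updates, layer):
    """Return the SPS parameters for a layer, overridden by any SUP SPS update."""
    params = dict(sps_params[layer])               # copy, so the SPS table is untouched
    params.update(sup_sps_updates.get(layer, {}))  # apply the SUP SPS changes, if any
    return params


# An SPS carrying parameters for every possible layer, keyed by (D, T, Q).
sps_params = {
    (0, 0, 0): {"max_bit_rate": 500, "buffer_size": 1000},
    (1, 0, 0): {"max_bit_rate": 2000, "buffer_size": 4000},
}
# A later SUP SPS that changes only one parameter of one layer.
sup_sps_updates = {(1, 0, 0): {"max_bit_rate": 1500}}
```

Only the changed value is sent in the SUP SPS in this sketch; the remaining values continue to come from the previously received SPS.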

Referring to Table 1, syntax is provided for a specific implementation of a single-layer SUP SPS. The syntax includes sequence_parameter_set_id to identify the associated SPS, and the identifiers of temporal_level, dependency_id, and quality_level to identify a scalable layer. The VUI parameters are included through the use of svc_vui_parameters( ) (see Table 2), which includes HRD parameters through the use of hrd_parameters( ). The syntax below allows each layer to specify its own layer-dependent parameters, such as, for example, HRD parameters.

TABLE 1

sup_seq_parameter_set_svc( ) {              C    Descriptor
  sequence_parameter_set_id                 0    ue(v)
  temporal_level                            0    u(3)
  dependency_id                             0    u(3)
  quality_level                             0    u(2)
  vui_parameters_present_svc_flag           0    u(1)
  if( vui_parameters_present_svc_flag )
    svc_vui_parameters( )
}

The semantics for the syntax of sup_seq_parameter_set_svc( ) are as follows.

-   sequence_parameter_set_id identifies the sequence parameter set to which the current SUP SPS maps for the current layer;
-   temporal_level, dependency_id, and quality_level specify the temporal level, dependency identifier, and quality level for the current layer. Dependency_id generally indicates spatial level. However, dependency_id also is used to indicate the Coarse Grain Scalability (“CGS”) hierarchy, which includes both spatial and SNR scalability, with SNR scalability being a traditional quality scalability. Accordingly, quality_level and dependency_id may both be used to distinguish quality levels.
-   vui_parameters_present_svc_flag equal to 1 specifies that the svc_vui_parameters( ) syntax structure as defined below is present. vui_parameters_present_svc_flag equal to 0 specifies that the svc_vui_parameters( ) syntax structure is not present.
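The fixed part of the Table 1 syntax may be read as sketched below. The sketch assumes the payload bytes have already been extracted from the NAL unit (NAL header removed and any emulation-prevention bytes stripped), and it stops before svc_vui_parameters( ). The BitReader class and the parse_sup_sps function are hypothetical helpers; u(n) reads an n-bit unsigned value and ue(v) reads an unsigned Exp-Golomb code, as in the AVC descriptors.

```python
class BitReader:
    """Reads bits most-significant-bit first from a bytes object."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current bit position

    def u(self, n: int) -> int:
        """Read an n-bit unsigned integer."""
        value = 0
        for _ in range(n):
            byte = self.data[self.pos // 8]
            bit = (byte >> (7 - self.pos % 8)) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value

    def ue(self) -> int:
        """Read an unsigned Exp-Golomb code."""
        leading_zeros = 0
        while self.u(1) == 0:
            leading_zeros += 1
        return (1 << leading_zeros) - 1 + self.u(leading_zeros)


def parse_sup_sps(payload: bytes) -> dict:
    """Parse the Table 1 fields in order, up to the VUI-present flag."""
    r = BitReader(payload)
    fields = {}
    fields["sequence_parameter_set_id"] = r.ue()
    fields["temporal_level"] = r.u(3)
    fields["dependency_id"] = r.u(3)
    fields["quality_level"] = r.u(2)
    fields["vui_parameters_present_svc_flag"] = r.u(1)
    # If the flag is 1, svc_vui_parameters( ) would follow; not parsed here.
    return fields


# Bits 1 010 010 10 0 (then padding): SPS ID 0, layer (D2, T2, Q2), no VUI.
example = parse_sup_sps(bytes([0xA5, 0x00]))
```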

Table 2 gives the syntax for svc_vui_parameters( ). The VUI parameters are therefore separated for each layer and put into individual SUP SPS units. Other implementations, however, group the VUI parameters for multiple layers into a single SUP SPS.

TABLE 2

svc_vui_parameters( ) {                                  C    Descriptor
  timing_info_present_flag                               0    u(1)
  if( timing_info_present_flag ) {
    num_units_in_tick                                    0    u(32)
    time_scale                                           0    u(32)
    fixed_frame_rate_flag                                0    u(1)
  }
  nal_hrd_parameters_present_flag                        0    u(1)
  if( nal_hrd_parameters_present_flag )
    hrd_parameters( )
  vcl_hrd_parameters_present_flag                        0    u(1)
  if( vcl_hrd_parameters_present_flag )
    hrd_parameters( )
  if( nal_hrd_parameters_present_flag ||
      vcl_hrd_parameters_present_flag )
    low_delay_hrd_flag                                   0    u(1)
  pic_struct_present_flag                                0    u(1)
  bitstream_restriction_flag                             0    u(1)
  if( bitstream_restriction_flag ) {
    motion_vectors_over_pic_boundaries_flag              0    u(1)
    max_bytes_per_pic_denom                              0    ue(v)
    max_bits_per_mb_denom                                0    ue(v)
    log2_max_mv_length_horizontal                        0    ue(v)
    log2_max_mv_length_vertical                          0    ue(v)
    num_reorder_frames                                   0    ue(v)
    max_dec_frame_buffering                              0    ue(v)
  }
}

The fields of the svc_vui_parameters( ) syntax of Table 2 are defined in the version of the SVC extension that existed in April 2007 under JVT-U201 annex E E.1. In particular, hrd_parameters( ) is as defined for the AVC standard. Note also that svc_vui_parameters( ) includes various layer-dependent information, including HRD-related parameters. The HRD-related parameters include num_units_in_tick, time_scale, fixed_frame_rate_flag, nal_hrd_parameters_present_flag, vcl_hrd_parameters_present_flag, hrd_parameters( ), low_delay_hrd_flag, and pic_struct_present_flag. Further, the syntax elements in the bitstream_restriction_flag if-block are layer-dependent even though not HRD-related.

As mentioned above, the SUP SPS is defined as a new type of NAL unit. Table 3 lists some of the NAL unit codes as defined by the standard JVT-U201, but modified to assign type 24 to the SUP SPS. The ellipses between NAL unit types 1 and 16, and between 18 and 24, indicate that those types are unchanged. The ellipsis between NAL unit types 25 and 31 means that those types are all unspecified. The implementation of Table 3 below changes type 24 of the standard from “unspecified” to “sup_seq_parameter_set_svc( )”. “Unspecified” is generally reserved for user applications. “Reserved”, on the other hand, is generally reserved for future standard modifications. Accordingly, another implementation changes one of the “reserved” types (for example, type 16, 17, or 18) to “sup_seq_parameter_set_svc( )”. Changing an “unspecified” type results in an implementation for a given user, whereas changing a “reserved” type results in an implementation that changes the standard for all users.

TABLE 3

nal_unit_type    Content of NAL unit and RBSP syntax structure    C
0                Unspecified
1                Coded slice of a non-IDR picture                 2, 3, 4
                 slice_layer_without_partitioning_rbsp( )
. . .            . . .                                            . . .
16-18            Reserved
. . .            . . .
24               sup_seq_parameter_set_svc( )
25 . . . 31      Unspecified
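A receiver could recognize the SUP SPS from the NAL unit type as sketched below. The sketch relies on the AVC NAL unit header layout, in which nal_unit_type occupies the five least-significant bits of the first header byte, and on the type 24 assignment of Table 3. The function names and the returned labels are hypothetical, and types not listed in Table 3 are simply reported as unchanged.

```python
SUP_SPS_NAL_TYPE = 24  # assignment made by the implementation of Table 3


def nal_unit_type(first_header_byte: int) -> int:
    """Extract nal_unit_type (low 5 bits) from the first NAL header byte."""
    return first_header_byte & 0x1F


def describe(nal_type: int) -> str:
    """Classify a NAL unit type according to the rows shown in Table 3."""
    if nal_type == SUP_SPS_NAL_TYPE:
        return "sup_seq_parameter_set_svc"
    if nal_type == 0 or 25 <= nal_type <= 31:
        return "unspecified"
    if 16 <= nal_type <= 18:
        return "reserved"
    return "unchanged from the standard"


# 0x78 = 0 11 11000: forbidden bit 0, nal_ref_idc 3, nal_unit_type 24.
example_type = nal_unit_type(0x78)
```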

FIG. 8 shows a functional view of an implementation of a scalable video coder 800 that generates SUP SPS units. A video is received at the input of the scalable video coder 1. The video is coded according to different spatial levels. Spatial levels mainly refer to different levels of resolution of the same video. For example, as the input of a scalable video coder, one can have a CIF sequence (352 by 288) or a QCIF sequence (176 by 144), each of which represents one spatial level.

Each of the spatial levels is sent to an encoder. The spatial level 1 is sent to an encoder 2″, the spatial level 2 is sent to an encoder 2′, and the spatial level m is sent to an encoder 2.

The spatial levels are coded with 3 bits, using the dependency_id. Therefore, the maximum number of spatial levels in this implementation is 8.
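The limit follows directly from the field width: an n-bit unsigned field can take 2^n distinct values. As a small sketch, the same arithmetic can be applied to the three layer identifiers using the bit widths given in Table 1 (the dictionary names are illustrative only).

```python
# Bit widths of the layer identifiers, taken from the descriptors in Table 1.
FIELD_BITS = {"temporal_level": 3, "dependency_id": 3, "quality_level": 2}

# An n-bit unsigned field distinguishes 2**n values.
max_levels = {name: 2 ** bits for name, bits in FIELD_BITS.items()}
```

Under these widths, dependency_id distinguishes 8 values, matching the maximum of 8 spatial levels stated above.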

The encoders 2, 2′, and 2″ encode one or more layers having the indicated spatial level. The encoders 2, 2′, and 2″ may be designed to have particular quality levels and temporal levels, or the quality levels and temporal levels may be configurable. As can be seen from FIG. 8, the encoders 2, 2′, and 2″ are hierarchically arranged. That is, the encoder 2″ feeds the encoder 2′, which in turn feeds the encoder 2. The hierarchical arrangement indicates the typical scenario in which higher layers use a lower layer(s) as a reference.

After the coding, the headers are prepared for each of the layers. In the implementation shown, for each spatial level, an SPS message, a PPS message, and multiple SUP_SPS messages are created. SUP SPS messages (or units) may be created, for example, for layers corresponding to the various different quality and temporal levels.

For spatial level 1, SPS and PPS 5″ are created and a set of SUP_SPS₁ ¹, SUP_SPS₂ ¹, . . . , SUP_SPS_(n*O) ¹ are also created.

For spatial level 2, SPS and PPS 5′ are created and a set of SUP_SPS₁ ², SUP_SPS₂ ², . . . , SUP_SPS_(n*O) ² are also created.

For spatial level m, SPS and PPS 5 are created and a set of SUP_SPS₁ ^(m), SUP_SPS₂ ^(m), . . . , SUP_SPS_(n*O) ^(m) are also created.

The bitstreams 7, 7′, and 7″ encoded by the encoders 2, 2′, and 2″, typically follow the plurality of SPS, PPS, and SUP_SPS (also referred to as headers, units, or messages) in the global bitstream.

A bitstream 8″ includes SPS and PPS 5″, SUP_SPS₁ ¹, SUP_SPS₂ ¹, . . . , SUP_SPS_(n*O) ¹ 6″, and encoded video bitstream 7″, which constitute all the encoded data associated with spatial level 1.

A bitstream 8′ includes SPS and PPS 5′, SUP_SPS₁ ², SUP_SPS₂ ², . . . , SUP_SPS_(n*O) ² 6′, and encoded video bitstream 7′, which constitute all the encoded data associated with spatial level 2.

A bitstream 8 includes SPS and PPS 5, SUP_SPS₁ ^(m), SUP_SPS₂ ^(m), . . . , SUP_SPS_(n*O) ^(m) 6, and encoded video bitstream 7, which constitute all the encoded data associated with spatial level m.

The different SUP_SPS headers are compliant with the headers described in Tables 1-3.

The encoder 800 depicted in FIG. 8 generates one SPS for each spatial level. However, other implementations may generate multiple SPS for each spatial level or may generate an SPS that serves multiple spatial levels.

The bitstreams 8, 8′, and 8″ are combined in a multiplexer 9 which produces an SVC bitstream, as shown in FIG. 8.

Referring to FIG. 9, a hierarchical view 900 illustrates the generation of a data stream that contains SUP SPS units. The view 900 may be used to illustrate the possible bitstreams generated by the scalable video encoder 800 of FIG. 8. The view 900 provides an SVC bitstream to a transmission interface 17.

The SVC bitstream may be generated, for example, according to the implementation of FIG. 8, and comprises one SPS for each of the spatial levels. When m spatial levels are encoded, the SVC bitstream comprises SPS1, SPS2 and SPSm represented by 10, 10′ and 10″ in FIG. 9.

In the SVC bitstream, each SPS codes the general information relative to the spatial level. The SPS is followed by a header 11, 11′, 11″, 13, 13′, 13″, 15, 15′, and 15″ of SUP_SPS type. The SUP_SPS is followed by the corresponding encoded video data 12, 12′, 12″, 14, 14′, 14″, 16, 16′, and 16″ which each correspond to one temporal level (n) and one quality level (O).

Therefore, when one layer is not transmitted, the corresponding SUP_SPS is also not transmitted. This is because there is typically one SUP_SPS header corresponding to each layer.

Typical implementations use a numbering scheme for layers in which the base layer has a D and Q of zero. If such a numbering scheme is used for the view 900, then the view 900 does not explicitly show a base layer. That does not preclude the use of a base layer. Additionally, however, the view 900 may be augmented to explicitly show a bitstream for a base layer, as well as, for example, a separate SPS for a base layer. Further, the view 900 may use an alternate numbering scheme for base layers, in which one or more of the bitstreams (1, 1, 1) through (m, n, O) refers to a base layer.

Referring to FIG. 10, a block view is provided of a data stream 1000 generated by the implementation of FIGS. 8 and 9. FIG. 10 illustrates the transmission of the following layers:

-   Layer (1, 1, 1): spatial level 1, temporal level 1, quality level 1; which includes transmission of blocks 10, 11, and 12;
-   Layer (1, 2, 1): spatial level 1, temporal level 2, quality level 1; which includes the additional transmission of blocks 11′ and 12′;
-   Layer (2, 1, 1): spatial level 2, temporal level 1, quality level 1; which includes the additional transmission of blocks 10′, 13, and 14;
-   Layer (3, 1, 1): spatial level 3, temporal level 1, quality level 1; which includes the additional transmission of blocks 10″, 15, and 16;
-   Layer (3, 2, 1): spatial level 3, temporal level 2, quality level 1; which includes the additional transmission of blocks 15′ and 16′;
-   Layer (3, 3, 1): spatial level 3, temporal level 3, quality level 1; which includes the additional transmission of blocks 15″ and 16″.

The block view of the data stream 1000 illustrates that SPS 10 is only sent once and is used by both Layer (1, 1, 1) and Layer (1, 2, 1), and that SPS 10″ is only sent once and is used by each of Layer (3, 1, 1), Layer (3, 2, 1), and Layer (3, 3, 1). Further, the data stream 1000 illustrates that not all of the layers' parameters are transmitted, but rather only the parameters corresponding to the transmitted layers. For example, the parameters for layer (2, 2, 1), corresponding to SUP_SPS₂ ², are not transmitted because that layer is not transmitted. This provides an efficiency for this implementation.
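The ordering of the data stream 1000 may be sketched as follows, assuming one SPS per spatial level and one SUP_SPS per transmitted layer. The build_stream function and the string labels are hypothetical stand-ins for the blocks of FIG. 10.

```python
def build_stream(layers):
    """Emit block labels for the given (D, T, Q) layers in transmission order."""
    stream = []
    sent_sps = set()
    for d, t, q in layers:
        if d not in sent_sps:
            # The SPS for a spatial level is sent only once.
            stream.append(f"SPS{d}")
            sent_sps.add(d)
        # Each transmitted layer carries its own SUP_SPS and encoded data.
        stream.append(f"SUP_SPS({d},{t},{q})")
        stream.append(f"Bitstream({d},{t},{q})")
    return stream


# The six layers listed above; layer (2, 2, 1) is not transmitted.
layers = [(1, 1, 1), (1, 2, 1), (2, 1, 1), (3, 1, 1), (3, 2, 1), (3, 3, 1)]
stream = build_stream(layers)
```

In this sketch the stream contains three SPS labels and no entry for layer (2, 2, 1), mirroring the efficiency noted above.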

Referring to FIG. 11, an encoder 1100 includes an SPS generation unit 1110, a video encoder 1120, and a formatter 1130. The video encoder 1120 receives input video, encodes the input video, and provides the encoded input video to the formatter 1130. The encoded input video may include, for example, multiple layers such as, for example, an encoded base layer and an encoded enhancement layer. The SPS generation unit 1110 generates header information, such as, for example, SPS units and SUP SPS units, and provides the header information to the formatter 1130. The SPS generation unit 1110 also communicates with the video encoder 1120 to provide parameters used by the video encoder 1120 in encoding the input video.

The SPS generation unit 1110 may be configured, for example, to generate an SPS NAL unit. The SPS NAL unit may include information that describes a parameter for use in decoding a first-layer encoding of a sequence of images. The SPS generation unit 1110 also may be configured, for example, to generate a SUP SPS NAL unit having a different structure than the SPS NAL unit. The SUP SPS NAL unit may include information that describes a parameter for use in decoding a second-layer encoding of the sequence of images. The first-layer encoding and the second-layer encoding may be produced by the video encoder 1120.

The formatter 1130 multiplexes the encoded video from the video encoder 1120, and the header information from the SPS generation unit 1110, to produce an output encoded bitstream. The encoded bitstream may be a set of data that includes the first-layer encoding of the sequence of images, the second-layer encoding of the sequence of images, the SPS NAL unit, and the SUP SPS NAL unit.

The components 1110, 1120, and 1130 of the encoder 1100 may take many forms. One or more of the components 1110, 1120, and 1130 may include hardware, software, firmware, or a combination, and may be operated from a variety of platforms, such as, for example, a dedicated encoder or a general processor configured through software to function as an encoder.

FIGS. 8 and 11 may be compared. The SPS generation unit 1110 may generate the SPS and the various SUP_SPS_(n*O) ^(m) shown in FIG. 8. The video encoder 1120 may generate the bitstreams 7, 7′, and 7″ (which are the encodings of the input video) shown in FIG. 8. The video encoder 1120 may correspond, for example, to one or more of the encoders 2, 2′, or 2″. The formatter 1130 may generate the hierarchically arranged data shown by reference numerals 8, 8′, 8″, as well as perform the operation of the multiplexer 9 to generate the SVC bitstream of FIG. 8.

FIGS. 1 and 11 also may be compared. The video encoder 1120 may correspond, for example, to blocks 104 and 187 of FIG. 1. The formatter 1130 may correspond, for example, to the multiplexer 170. The SPS generation unit 1110 is not explicitly shown in FIG. 1 although the functionality of the SPS generation unit 1110 may be performed, for example, by the multiplexer 170.

Other implementations of encoder 1100 do not include the video encoder 1120 because, for example, the data is pre-encoded. The encoder 1100 also may provide additional outputs and provide additional communication between the components. The encoder 1100 also may be modified to provide additional components which may, for example, be located between existing components.

Referring to FIG. 12, an encoder 1200 is shown that operates in the same manner as the encoder 1100. The encoder 1200 includes a memory 1210 in communication with a processor 1220. The memory 1210 may be used, for example, to store the input video, to store encoding or decoding parameters, to store intermediate or final results during the encoding process, or to store instructions for performing an encoding method. Such storage may be temporary or permanent.

The processor 1220 receives input video and encodes the input video. The processor 1220 also generates header information, and formats an encoded bitstream that includes header information and encoded input video. As in the encoder 1100, the header information provided by the processor 1220 may include separate structures for conveying header information for multiple layers. The processor 1220 may operate according to instructions stored on, or otherwise resident on or part of, for example, the processor 1220 or the memory 1210.

Referring to FIG. 13, a process 1300 is shown for encoding input video. The process 1300 may be performed by, for example, either of the encoders 1100 or 1200.

The process 1300 includes generating an SPS NAL unit (1310). The SPS NAL unit includes information that describes a parameter for use in decoding the first-layer encoding of the sequence of images. The SPS NAL unit may be defined by a coding standard or not. If the SPS NAL unit is defined by a coding standard, then the coding standard may require a decoder to operate in accordance with received SPS NAL units. Such a requirement is generally referred to by stating that the SPS NAL unit is “normative”. SPS, for example, are normative in the AVC standard, whereas supplemental enhancement information (“SEI”) messages, for example, are not normative. Accordingly, AVC-compatible decoders may ignore received SEI messages but must operate in accordance with received SPS.

The SPS NAL unit includes information describing one or more parameters for decoding a first layer. The parameter may be, for example, information that is layer-dependent, or is not layer-dependent. Examples of parameters that are typically layer-dependent include a VUI parameter or an HRD parameter.

Operation 1310 may be performed, for example, by the SPS generation unit 1110, the processor 1220, or the SPS and PPS Inserter 2140. The operation 1310 also may correspond to the generation of SPS in any of blocks 5, 5′, 5″ in FIG. 8.

Accordingly, a means for performing the operation 1310, that is, generating an SPS NAL unit, may include various components. For example, such means may include a module for generating SPS 5, 5′, or 5″, an entire encoder system of FIG. 1, 8, 11, or 12, an SPS generation unit 1110, a processor 1220, or an SPS and PPS Inserter 2140, or their equivalents including known and future-developed encoders.

The process 1300 includes generating a supplemental (“SUP”) SPS NAL unit having a different structure than the SPS NAL unit (1320). The SUP SPS NAL unit includes information that describes a parameter for use in decoding the second-layer encoding of the sequence of images. The SUP SPS NAL unit may be defined by a coding standard, or not. If the SUP SPS NAL unit is defined by a coding standard, then the coding standard may require a decoder to operate in accordance with received SUP SPS NAL units. As discussed above with respect to operation 1310, such a requirement is generally referred to by stating that the SUP SPS NAL unit is “normative”.

Various implementations include normative SUP SPS messages. For example, SUP SPS messages may be normative for decoders that decode more than one layer (for example, SVC-compatible decoders). Such multi-layer decoders (for example, SVC-compatible decoders) would be required to operate in accordance with the information conveyed in SUP SPS messages. However, single-layer decoders (for example, AVC-compatible decoders) could ignore SUP SPS messages. As another example, SUP SPS messages may be normative for all decoders, including single-layer and multi-layer decoders. It is not surprising that many implementations include normative SUP SPS messages, given that SUP SPS messages are based in large part on SPS messages, and that SPS messages are normative in the AVC standard and the SVC and MVC extensions. That is, SUP SPS messages carry data similar to that of SPS messages, serve a similar purpose, and may be considered to be a type of SPS message. Implementations having normative SUP SPS messages may provide compatibility advantages, for example, allowing AVC and SVC decoders to receive a common data stream.
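The first example above, in which a single-layer decoder ignores SUP SPS messages while a multi-layer decoder must honor them, may be sketched as a simple filter over a common data stream. The sketch assumes the type 24 assignment of Table 3 and uses the AVC type values 7 (SPS), 8 (PPS), and 5 (coded slice of an IDR picture); the function name and the tuple layout are hypothetical, and the sketch filters only SUP SPS units, not enhancement-layer picture data.

```python
NAL_IDR_SLICE = 5
NAL_SPS = 7
NAL_PPS = 8
NAL_SUP_SPS = 24  # assignment made by the implementation of Table 3


def units_seen_by(nal_units, multi_layer: bool):
    """Return the (type, payload) units a decoder acts on.

    A multi-layer decoder keeps every unit; a single-layer decoder
    ignores the SUP SPS units.
    """
    if multi_layer:
        return list(nal_units)
    return [unit for unit in nal_units if unit[0] != NAL_SUP_SPS]


# One common stream received by both kinds of decoder.
common_stream = [
    (NAL_SPS, b"sps"),
    (NAL_SUP_SPS, b"sup-sps"),
    (NAL_PPS, b"pps"),
    (NAL_IDR_SLICE, b"slice"),
]
```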

The SUP SPS NAL unit (also referred to as the SUP SPS message) includes one or more parameters for decoding a second layer. The parameter may be, for example, information that is layer-dependent, or is not layer-dependent. Specific examples include a VUI parameter or an HRD parameter. The SUP SPS may also be used for decoding the first layer, in addition to being used for decoding the second layer.

Operation 1320 may be performed, for example, by the SPS generation unit 1110, the processor 1220, or a module analogous to the SPS and PPS Inserter 2140. The operation 1320 also may correspond to the generation of SUP_SPS in any of blocks 6, 6′, 6″ in FIG. 8.

Accordingly, a means for performing the operation 1320, that is, generating a SUP SPS NAL unit, may include various components. For example, such means may include a module for generating SUP_SPS 6, 6′, or 6″, an entire encoder system of FIG. 1, 8, 11, or 12, an SPS generation unit 1110, a processor 1220, or a module analogous to the SPS and PPS Inserter 2140, or their equivalents including known and future-developed encoders.

The process 1300 includes encoding a first-layer encoding, such as, for example, the base layer, for a sequence of images, and encoding a second-layer encoding for the sequence of images (1330). These encodings of the sequence of images produce the first-layer encoding and the second-layer encoding. The first-layer encoding may be formatted into a series of units referred to as first-layer encoding units, and the second-layer encoding may be formatted into a series of units referred to as second-layer encoding units. The operation 1330 may be performed, for example, by the video encoder 1120, the processor 1220, the encoders 2, 2′, or 2″ of FIG. 8, or the implementation of FIG. 1.

Accordingly, a means for performing the operation 1330 may include various components. For example, such means may include an encoder 2, 2′, or 2″, an entire encoder system of FIG. 1, 8, 11, or 12, a video encoder 1120, a processor 1220, or one or more core encoders 187 (possibly including decimation module 104), or their equivalents including known and future-developed encoders.

The process 1300 includes providing a set of data (1340). The set of data includes the first-layer encoding of the sequence of images, the second-layer encoding of the sequence of images, the SPS NAL unit, and the SUP SPS NAL unit. The set of data may be, for example, a bitstream, encoded according to a known standard, to be stored in memory or transmitted to one or more decoders. Operation 1340 may be performed, for example, by the formatter 1130, the processor 1220, or the multiplexer 170 of FIG. 1. Operation 1340 may also be performed in FIG. 8 by the generation of any of the bitstreams 8, 8′, and 8″, as well as the generation of the multiplexed SVC bitstream.

Accordingly, a means for performing the operation 1340, that is, providing a set of data, may include various components. For example, such means may include a module for generating the bitstream 8, 8′, or 8″, a multiplexer 9, an entire encoder system of FIG. 1, 8, 11, or 12, a formatter 1130, a processor 1220, or a multiplexer 170, or their equivalents including known and future-developed encoders.

The process 1300 may be modified in various ways. For example, operation 1330 may be removed from the process 1300 in implementations in which, for example, the data is pre-encoded. Further, in addition to removing operation 1330, operation 1340 may be removed to provide a process directed toward generating description units for multiple layers.
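The operations 1310 through 1340, together with the pre-encoded variant that omits operation 1330, may be outlined in the following toy sketch. The dictionaries stand in for NAL units, and the function names, field names, and the toy_encode_layer helper are hypothetical; no actual video coding is performed.

```python
def process_1300(images, encode_layer=None, pre_encoded=None):
    """Toy outline of operations 1310-1340; returns the set of data as a list."""
    sps = {"type": "SPS", "sps_id": 0}                      # operation 1310
    sup_sps = {"type": "SUP_SPS", "sps_id": 0, "layer": 1}  # operation 1320
    if pre_encoded is not None:
        # Operation 1330 is omitted when the data is already encoded.
        first, second = pre_encoded
    else:
        first = encode_layer(images, 0)                     # operation 1330
        second = encode_layer(images, 1)
    return [sps, sup_sps, first, second]                    # operation 1340


def toy_encode_layer(images, layer):
    """Placeholder for a real layer encoder; it only tags the input."""
    return {"type": "VIDEO", "layer": layer, "data": list(images)}


stream = process_1300(["image0", "image1"], encode_layer=toy_encode_layer)
```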

Referring to FIG. 14, a data stream 1400 is shown that may be generated, for example, by the process 1300. The data stream 1400 includes a portion 1410 for an SPS NAL unit, a portion 1420 for a SUP SPS NAL unit, a portion 1430 for the first-layer encoded data, and a portion 1440 for the second-layer encoded data. The first-layer encoded data 1430 is the first-layer encoding, which may be formatted as first-layer encoding units. The second-layer encoded data 1440 is the second-layer encoding, which may be formatted as second-layer encoding units. The data stream 1400 may include additional portions which may be appended after the portion 1440 or interspersed between the portions 1410-1440. Additionally, other implementations may modify one or more of the portions 1410-1440.

The data stream 1400 may be compared to FIGS. 9 and 10. The SPS NAL unit 1410 may be, for example, any of the SPS1 10, the SPS2 10′, or the SPSm 10″. The SUP SPS NAL unit 1420 may be, for example, any of the SUP_SPS headers 11, 11′, 11″, 13, 13′, 13″, 15, 15′, or 15″. The first-layer encoded data 1430 and the second-layer encoded data 1440 may be any of the bitstreams for the individual layers shown as Bitstream of Layer (1, 1, 1) 12 through (m, n, O) 16″, and including the bitstreams 12, 12′, 12″, 14, 14′, 14″, 16, 16′, and 16″. It is possible for the first-layer encoded data 1430 to be a bitstream with a higher set of levels than the second-layer encoded data 1440. For example, the first-layer encoded data 1430 may be the Bitstream of Layer (2, 2, 1) 14′, and the second-layer encoded data 1440 may be the Bitstream of Layer (1, 1, 1) 12.

An implementation of the data stream 1400 may also correspond to the data stream 1000. The SPS NAL unit 1410 may correspond to the SPS module 10 of the data stream 1000. The SUP SPS NAL unit 1420 may correspond to the SUP_SPS module 11 of the data stream 1000. The first-layer encoded data 1430 may correspond to the Bitstream of Layer (1, 1, 1) 12 of the data stream 1000. The second-layer encoded data 1440 may correspond to the Bitstream of Layer (1, 2, 1) 12′ of the data stream 1000. The SUP_SPS module 11′ of the data stream 1000 may be interspersed between the first-layer encoded data 1430 and the second-layer encoded data 1440. The remaining blocks (10′-16″) shown in the data stream 1000 may be appended to the data stream 1400 in the same order shown in the data stream 1000.

FIGS. 9 and 10 may suggest that the SPS modules do not include any layer-specific parameters. Various implementations do operate in this manner, and typically require a SUP_SPS for each layer. However, other implementations allow the SPS to include layer-specific parameters for one or more layers, thus allowing one or more layers to be transmitted without requiring a SUP_SPS.

FIGS. 9 and 10 suggest that each spatial level has its own SPS. Other implementations vary this feature. For example, other implementations provide a separate SPS for each temporal level, or for each quality level. Still other implementations provide a separate SPS for each layer, and other implementations provide a single SPS that serves all layers.

Referring to FIG. 15, a decoder 1500 includes a parsing unit 1510 that receives an encoded bitstream, such as, for example, the encoded bitstream provided by the encoder 1100, the encoder 1200, the process 1300, or the data stream 1400. The parsing unit 1510 is coupled to a decoder 1520.

The parsing unit 1510 is configured to access information from an SPS NAL unit. The information from the SPS NAL unit describes a parameter for use in decoding a first-layer encoding of a sequence of images. The parsing unit 1510 is further configured to access information from a SUP SPS NAL unit having a different structure than the SPS NAL unit. The information from the SUP SPS NAL unit describes a parameter for use in decoding a second-layer encoding of the sequence of images. As described above in conjunction with FIG. 13, the parameters may be layer-dependent or non-layer-dependent.

The parsing unit 1510 provides parsed header data as an output. The header data includes the information accessed from the SPS NAL unit and also includes the information accessed from the SUP SPS NAL unit. The parsing unit 1510 also provides parsed encoded video data as an output. The encoded video data includes the first-layer encoding and the second-layer encoding. Both the header data and the encoded video data are provided to the decoder 1520.

The decoder 1520 decodes the first-layer encoding using the information accessed from the SPS NAL unit. The decoder 1520 also decodes the second-layer encoding using the information accessed from the SUP SPS NAL unit. The decoder 1520 further generates a reconstruction of the sequence of images based on the decoded first-layer and/or the decoded second-layer. The decoder 1520 provides a reconstructed video as an output. The reconstructed video may be, for example, a reconstruction of the first-layer encoding or a reconstruction of the second-layer encoding.
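The division of work between the parsing unit 1510 and the decoder 1520 may be outlined in the following toy sketch. Dictionaries stand in for NAL units, the parameter values are invented for illustration, and the function names and field names are hypothetical; the "decoding" step only pairs each layer with the parameters it would use.

```python
def parsing_unit(stream):
    """Separate header data (SPS, SUP SPS) from encoded video data."""
    headers = {"SPS": {}, "SUP_SPS": {}}
    video = {}
    for unit in stream:
        if unit["type"] == "SPS":
            headers["SPS"][unit["sps_id"]] = unit["params"]
        elif unit["type"] == "SUP_SPS":
            headers["SUP_SPS"][(unit["sps_id"], unit["layer"])] = unit["params"]
        else:
            video[unit["layer"]] = unit
    return headers, video


def decoder(headers, video, layer, sps_id=0, first_layer=0):
    """Pick SPS parameters for the first layer, SUP SPS parameters otherwise."""
    if layer == first_layer:
        params = headers["SPS"][sps_id]
    else:
        params = headers["SUP_SPS"][(sps_id, layer)]
    return {"layer": layer, "params": params, "frames": video[layer]["data"]}


# A toy stream laid out like the data stream 1400 (portions 1410-1440).
toy_stream = [
    {"type": "SPS", "sps_id": 0, "params": {"max_bit_rate": 500}},
    {"type": "SUP_SPS", "sps_id": 0, "layer": 1, "params": {"max_bit_rate": 2000}},
    {"type": "VIDEO", "layer": 0, "data": ["f0"]},
    {"type": "VIDEO", "layer": 1, "data": ["f0+"]},
]
headers, video = parsing_unit(toy_stream)
second_layer = decoder(headers, video, layer=1)
```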

Comparing FIGS. 15, 2, and 2a, the parsing unit 1510 may correspond, for example, to the demultiplexer 202, and/or one or more of the entropy decoders 204, 212, 222, or 2245, in some implementations. The decoder 1520 may correspond, for example, to the remaining blocks in FIG. 2.

The decoder 1500 also may provide additional outputs and provide additional communication between the components. The decoder 1500 also may be modified to provide additional components which may, for example, be located between existing components.

The components 1510 and 1520 of the decoder 1500 may take many forms. One or more of the components 1510 and 1520 may include hardware, software, firmware, or a combination, and may be operated from a variety of platforms, such as, for example, a dedicated decoder or a general processor configured through software to function as a decoder.

Referring to FIG. 16, a decoder 1600 is shown that operates in the same manner as the decoder 1500. The decoder 1600 includes a memory 1610 in communication with a processor 1620. The memory 1610 may be used, for example, to store the input encoded bitstream, to store decoding or encoding parameters, to store intermediate or final results during the decoding process, or to store instructions for performing a decoding method. Such storage may be temporary or permanent.

The processor 1620 receives an encoded bitstream and decodes the encoded bitstream into a reconstructed video. The encoded bitstream includes, for example, (1) a first-layer encoding of a sequence of images, (2) a second-layer encoding of the sequence of images, (3) an SPS NAL unit having information that describes a parameter for use in decoding the first-layer encoding, and (4) a SUP SPS NAL unit having a different structure than the SPS NAL unit, and having information that describes a parameter for use in decoding the second-layer encoding.

The processor 1620 produces the reconstructed video based on at least the first-layer encoding, the second-layer encoding, the information from the SPS NAL unit, and the information from the SUP SPS NAL unit. The reconstructed video may be, for example, a reconstruction of the first-layer encoding or a reconstruction of the second-layer encoding. The processor 1620 may operate according to instructions stored on, or otherwise resident on or part of, for example, the processor 1620 or the memory 1610.

Referring to FIG. 17, a process 1700 is shown for decoding an encoded bitstream. The process 1700 may be performed by, for example, either of the decoders 1500 or 1600.

The process 1700 includes accessing information from an SPS NAL unit (1710). The accessed information describes a parameter for use in decoding a first-layer encoding of a sequence of images.

The SPS NAL unit may be as described earlier with respect to FIG. 13. Further, the accessed information may be, for example, an HRD parameter. Operation 1710 may be performed, for example, by the parsing unit 1510, the processor 1620, an entropy decoder 204, 212, 222, or 2245, or decoder control 2205. Operation 1710 also may be performed in a reconstruction process at an encoder by one or more components of an encoder.

Accordingly, a means for performing the operation 1710, that is, accessing information from an SPS NAL unit, may include various components. For example, such means may include a parsing unit 1510, a processor 1620, a single-layer decoder, an entire decoder system of FIG. 2, 15, or 16, or one or more components of a decoder, or one or more components of encoders 800, 1100, or 1200, or their equivalents including known and future-developed decoders and encoders.

The process 1700 includes accessing information from a SUP SPS NAL unit having a different structure than the SPS NAL unit (1720). The information accessed from the SUP SPS NAL unit describes a parameter for use in decoding a second-layer encoding of the sequence of images.

The SUP SPS NAL unit may be as described earlier with respect to FIG. 13. Further, the accessed information may be, for example, an HRD parameter. Operation 1720 may be performed, for example, by the parsing unit 1510, the processor 1620, an entropy decoder 204, 212, 222, or 2245, or decoder control 2205. Operation 1720 also may be performed in a reconstruction process at an encoder by one or more components of an encoder.

Accordingly, a means for performing the operation 1720, that is, accessing information from a SUP SPS NAL unit, may include various components. For example, such means may include a parsing unit 1510, a processor 1620, a demultiplexer 202, an entropy decoder 204, 212, or 222, a single-layer decoder, or an entire decoder system 200, 1500, or 1600, or one or more components of a decoder, or one or more components of encoders 800, 1100, or 1200, or their equivalents including known and future-developed decoders and encoders.

The process 1700 includes accessing a first-layer encoding and a second-layer encoding for the sequence of images (1730). The first-layer encoding may have been formatted into first-layer encoding units, and the second-layer encoding may have been formatted into second-layer encoding units. Operation 1730 may be performed, for example, by the parsing unit 1510, the decoder 1520, the processor 1620, an entropy decoder 204, 212, 222, or 2245, or various other blocks downstream of the entropy decoders. Operation 1730 also may be performed in a reconstruction process at an encoder by one or more components of an encoder.

Accordingly, a means for performing the operation 1730 may include various components. For example, such means may include a parsing unit 1510, a decoder 1520, a processor 1620, a demultiplexer 202, an entropy decoder 204, 212, or 222, a single-layer decoder, a bitstream receiver, a receiving device, or an entire decoder system 200, 1500, or 1600, or one or more components of a decoder, or one or more components of encoders 800, 1100, or 1200, or their equivalents including known and future-developed decoders and encoders.

The process 1700 includes generating a decoding of the sequence of images (1740). The decoding of the sequence of images may be based on the first-layer encoding, the second-layer encoding, the accessed information from the SPS NAL unit, and the accessed information from the SUP SPS NAL unit. Operation 1740 may be performed, for example, by the decoder 1520, the processor 1620, or various blocks downstream of demultiplexer 202 and input buffer 2210. Operation 1740 also may be performed in a reconstruction process at an encoder by one or more components of an encoder.

Accordingly, a means for performing the operation 1740 may include various components. For example, such means may include a decoder 1520, a processor 1620, a single-layer decoder, an entire decoder system 200, 1500, or 1600, or one or more components of a decoder, an encoder performing a reconstruction, or one or more components of encoders 800, 1100, or 1200, or their equivalents including known and future-developed decoders or encoders.
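The four operations of the process 1700 can be illustrated with a small Python sketch. It is not the decoding procedure of any standard: NAL units are modeled as already-parsed dictionaries, layer 0 is assumed to be the first layer, and actual decoding is replaced by pairing each encoding unit with the parameters that would govern its decoding. The function name process_1700 and the keys "kind", "layer", "params", and "payload" are hypothetical. The sketch also assumes that a non-base layer uses the SPS values overridden by any values carried in the SUP SPS for that layer.

```python
from typing import Dict, List, Tuple


def process_1700(stream: List[dict], target_layer: int) -> List[Tuple[int, str, dict]]:
    """Toy walk-through of operations 1710 to 1740 on pre-parsed units."""
    # 1710: access information from the SPS NAL unit.
    sps_params: dict = next(u["params"] for u in stream if u["kind"] == "sps")

    # 1720: access information from the SUP SPS NAL units, one per layer.
    sup_params: Dict[int, dict] = {
        u["layer"]: u["params"] for u in stream if u["kind"] == "sup_sps"
    }

    # 1730: access the layer encodings up to the requested layer.
    encodings = [
        u for u in stream if u["kind"] == "video" and u["layer"] <= target_layer
    ]

    # 1740: generate the "decoding". Here each payload is only paired with
    # the parameters that apply to its layer (stand-in for real decoding).
    decoded = []
    for unit in encodings:
        if unit["layer"] == 0:
            params = dict(sps_params)
        else:
            # Assumption: SUP SPS values override the shared SPS values.
            params = {**sps_params, **sup_params.get(unit["layer"], {})}
        decoded.append((unit["layer"], unit["payload"], params))
    return decoded
```

Calling the function with target_layer set to 0 yields only the first-layer result, which mirrors a terminal that reconstructs the first-layer encoding alone.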

Another implementation performs an encoding method that includes accessing first layer-dependent information in a first normative parameter set. The accessed first layer-dependent information is for use in decoding a first-layer encoding of a sequence of images. The first normative parameter set may be, for example, an SPS that includes HRD-related parameters or other layer-dependent information. However, the first normative parameter set need not be an SPS and need not be related to an H.264 standard.

In addition to the first parameter set being normative, which requires a decoder to operate in accordance with the first parameter set if such a parameter set is received, the first parameter set may also be required to be received in an implementation. That is, an implementation may further require that the first parameter set be provided to a decoder.

The encoding method of this implementation further includes accessing second layer-dependent information in a second normative parameter set. The second normative parameter set has a different structure than the first normative parameter set. Also, the accessed second layer-dependent information is for use in decoding a second-layer encoding of the sequence of images. The second normative parameter set may be, for example, a supplemental SPS. The supplemental SPS has a structure that is different from, for example, an SPS. The supplemental SPS also includes HRD parameters or other layer-dependent information for a second layer (different from the first layer).

The encoding method of this implementation further includes decoding the sequence of images based on one or more of the accessed first layer-dependent information or the accessed second layer-dependent information. This may include, for example, decoding a base layer or an enhancement layer.

Corresponding apparatuses are also provided in other implementations, for implementing the encoding method of this implementation. Such apparatuses include, for example, programmed encoders, programmed processors, hardware implementations, or processor-readable media having instructions for performing the encoding method. The systems 1100 and 1200, for example, may implement the encoding method of this implementation.

Corresponding signals, and media storing such signals or the data of such signals, are also provided. Such signals are produced, for example, by an encoder that performs the encoding method of this implementation.

Another implementation performs a decoding method analogous to the above encoding method. The decoding method includes generating a first normative parameter set that includes first layer-dependent information. The first layer-dependent information is for use in decoding a first-layer encoding of a sequence of images. The decoding method also includes generating a second normative parameter set having a different structure than the first normative parameter set. The second normative parameter set includes second layer-dependent information for use in decoding a second-layer encoding of the sequence of images. The decoding method further includes providing a set of data including the first normative parameter set and the second normative parameter set.

Corresponding apparatuses are also provided in other implementations, for implementing the above decoding method of this implementation. Such apparatuses include, for example, programmed decoders, programmed processors, hardware implementations, or processor-readable media having instructions for performing the decoding method. The systems 1500 and 1600, for example, may implement the decoding method of this implementation.

Note that the term “supplemental”, as used above, for example, in referring to “supplemental SPS” is a descriptive term. As such, “supplemental SPS” does not preclude units that do not include the term “supplemental” in the unit name. Accordingly, and by way of example, a current draft of the SVC extension defines a “subset SPS” syntax structure, and the “subset SPS” syntax structure is fully encompassed by the descriptive term “supplemental”. Thus, the “subset SPS” of the current SVC extension is one implementation of a SUP SPS as described in this disclosure.

Implementations may use other types of messages in addition to, or as a replacement for, the SPS NAL units and/or the SUP SPS NAL units. For example, at least one implementation generates, transmits, receives, accesses, and parses other parameter sets having layer-dependent information.

Further, although SPS and supplemental SPS have been discussed largely in the context of H.264 standards, other standards also may include SPS, supplemental SPS, or variations of SPS or supplemental SPS. Accordingly, other standards (existing or future-developed) may include structures referred to as SPS or supplemental SPS, and such structures may be identical to or be variations of the SPS and supplemental SPS described herein. Such other standards may, for example, be related to current H.264 standards (for example, an amendment to an existing H.264 standard), or be completely new standards. Alternatively, other standards (existing or future-developed) may include structures that are not referred to as SPS or supplemental SPS, but such structures may be identical to, analogous to, or variations of the SPS or supplemental SPS described herein.

Note that a parameter set is a set of data including parameters. Examples include an SPS, a PPS, and a supplemental SPS.

In various implementations, data is said to be “accessed”. “Accessing” data may include, for example, receiving, storing, transmitting, or processing data.

Various implementations are provided and described. These implementations can be used to solve a variety of problems. One such problem arises when multiple interoperability points (IOPs) (also referred to as layers) need different values for parameters that are typically carried in the SPS. There is no adequate method to transmit the layer-dependent syntax elements in the SPS for different layers having the same SPS identifier. It is problematic to send separate SPS data for each such layer. For example, in many existing systems a base layer and its composite temporal layers share the same SPS identifier.

Several implementations provide a different NAL unit type for supplemental SPS data. Thus, multiple NAL units may be sent, and each NAL unit may include supplemental SPS information for a different SVC layer, but each NAL unit may be identified by the same NAL unit type. The supplemental SPS information may, in one implementation, be provided in the “subset SPS” NAL unit type of the current SVC extension.
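One way to picture several supplemental SPS NAL units that share a single NAL unit type and a single SPS identifier, yet carry parameters for different layers, is a lookup table keyed by layer. The Python sketch below is illustrative only. The class names LayerKey and SupSpsStore, the key fields dependency_id, temporal_level, and quality_level, and the type code 15 are assumptions made for the example and are not prescribed by this passage.

```python
from typing import Dict, NamedTuple, Optional

SUP_SPS_NAL_TYPE = 15  # assumed type code shared by all SUP SPS NAL units


class LayerKey(NamedTuple):
    """Hypothetical identifier of one layer that refers to a given SPS."""
    sps_id: int
    dependency_id: int
    temporal_level: int
    quality_level: int


class SupSpsStore:
    """Keeps the supplemental parameters received for each layer."""

    def __init__(self) -> None:
        self._by_layer: Dict[LayerKey, dict] = {}

    def add(self, nal_type: int, key: LayerKey, params: dict) -> None:
        # Every supplemental unit carries the same type code; only the
        # layer key distinguishes one unit from another.
        if nal_type != SUP_SPS_NAL_TYPE:
            raise ValueError("not a SUP SPS NAL unit")
        self._by_layer[key] = params

    def lookup(self, key: LayerKey) -> Optional[dict]:
        """Return the parameters for a layer, or None if none were sent."""
        return self._by_layer.get(key)
```

With such a table, a base layer and its temporal layers can keep the same SPS identifier while each layer still obtains its own layer-dependent values.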

It should be clear that the implementations described in this disclosure are not restricted to the SVC extension or to any other standard. The concepts and features of the disclosed implementations may be used with other standards that exist now or are developed in the future, or may be used in systems that do not adhere to any standard. As one example, the concepts and features disclosed herein may be used for implementations that work in the environment of the MVC extension. For example, MVC views may need different SPS information, or SVC layers supported within the MVC extension may need different SPS information. Additionally, features and aspects of described implementations may also be adapted for yet other implementations. Accordingly, although implementations described herein may be described in the context of SPS for SVC layers, such descriptions should in no way be taken as limiting the features and concepts to such implementations or contexts.

The implementations described herein may be implemented in, for example, a method or process, an apparatus, or a software program. Even if only discussed in the context of a single form of implementation (for example, discussed only as a method), the implementation of features discussed may also be implemented in other forms (for example, an apparatus or program). An apparatus may be implemented in, for example, appropriate hardware, software, and firmware. The methods may be implemented in, for example, an apparatus such as, for example, a processor, which refers to processing devices in general, including, for example, a computer, a microprocessor, an integrated circuit, or a programmable logic device. Processors also include communication devices, such as, for example, computers, cell phones, portable/personal digital assistants (“PDAs”), and other devices that facilitate communication of information between end-users.

Implementations of the various processes and features described herein may be embodied in a variety of different equipment or applications, particularly, for example, equipment or applications associated with data encoding and decoding. Examples of equipment include video coders, video decoders, video codecs, web servers, set-top boxes, laptops, personal computers, cell phones, PDAs, and other communication devices. As should be clear, the equipment may be mobile and even installed in a mobile vehicle.

Additionally, the methods may be implemented by instructions being performed by a processor, and such instructions may be stored on a processor-readable medium such as, for example, an integrated circuit, a software carrier or other storage device such as, for example, a hard disk, a compact diskette, a random access memory (“RAM”), or a read-only memory (“ROM”). The instructions may form an application program tangibly embodied on a processor-readable medium. Instructions may be, for example, in hardware, firmware, software, or a combination. Instructions may be found in, for example, an operating system, a separate application, or a combination of the two. A processor may be characterized, therefore, as, for example, both a device configured to carry out a process and a device that includes a computer readable medium having instructions for carrying out a process.

As will be evident to one of skill in the art, implementations may produce a variety of signals formatted to carry information that may be, for example, stored or transmitted. The information may include, for example, instructions for performing a method, or data produced by one of the described implementations. For example, a signal may be formatted to carry as data the rules for writing or reading the syntax of a described embodiment, or to carry as data the actual syntax-values written by a described embodiment. Such a signal may be formatted, for example, as an electromagnetic wave (for example, using a radio frequency portion of spectrum) or as a baseband signal. The formatting may include, for example, encoding a data stream and modulating a carrier with the encoded data stream. The information that the signal carries may be, for example, analog or digital information. The signal may be transmitted over a variety of different wired or wireless links, as is known.

A number of implementations have been described. Nevertheless, it will be understood that various modifications may be made. For example, elements of different implementations may be combined, supplemented, modified, or removed to produce other implementations. Additionally, one of ordinary skill will understand that other structures and processes may be substituted for those disclosed and the resulting implementations will perform at least substantially the same function(s), in at least substantially the same way(s), to achieve at least substantially the same result(s) as the implementations disclosed. Accordingly, these and other implementations are contemplated by this application and are within the scope of the following claims.

1. An apparatus comprising a processor-readable medium including instructions stored on the processor-readable medium for performing at least the following: accessing information from a sequence parameter set (“SPS”) network abstraction layer (“NAL”) unit, the information describing a parameter for use in decoding a first-layer encoding of an image in a sequence of images; accessing supplemental information from a supplemental SPS NAL unit having an available NAL unit type code that is a different NAL unit type code from that of the SPS NAL unit, and having a different syntax structure than the SPS NAL unit, and the supplemental information from the supplemental SPS NAL unit describing a parameter for use in decoding a second-layer encoding of the image in the sequence of images; and decoding the first-layer encoding, and the second-layer encoding, based on, respectively, the accessed information from the SPS NAL unit, and the accessed supplemental information from the supplemental SPS NAL unit.
2. The apparatus of claim 1 wherein the parameter for use in decoding the second-layer encoding comprises a video usability information (“VUI”) parameter.
3. The apparatus of claim 2 wherein the parameter for use in decoding the first-layer encoding comprises an HRD parameter.